Proceedings of the 6th International Workshop on Systems Software Verification (SSV 2011) Jörg Brauer Marco Roveri Hendrik
Authors
Abstract
Refinement-based CFG Reconstruction from Executables
Sébastien Bardin, Philippe Herrmann, and Franck Védrine
We address the issue of recovering a safe and precise approximation of the Control Flow Graph (CFG) of a program given as an executable file. The problem is tackled in an original way, with a refinement-based static analysis working over finite sets of constant values. Requirement propagation allows the analysis to automatically adjust the domain precision only where it is needed, resulting in precise CFG recovery at moderate cost. First experiments, including an industrial case study, show that the method outperforms standard analyses in terms of precision, efficiency or robustness.
Motivation. Automatic analysis of programs from their executable files has many potential applications in safety and security, for example automatic analysis of mobile code and malware, security testing, or worst-case execution time estimation. We address the problem of (safe) CFG reconstruction, i.e. constructing an approximation of the CFG of a program given as an executable file that is both safe and precise. CFG reconstruction is a cornerstone of safe binary-level analysis: if the recovery is unsafe, subsequent analyses will be unsafe too; if it is too coarse, they will be blurred by too many infeasible branches and instructions.
Fig. 1. CFG reconstruction from an executable file
Challenges. Such an approximation is difficult to obtain mainly because of dynamic jumps, i.e. jump instructions whose target expression is resolved at run-time and may vary from one execution to the other. Dynamic jumps are very sensitive instructions: a small loss of precision on target expressions may dramatically degrade the quality of the subsequent analysis, leading to vicious circles between value analysis and CFG reconstruction. Moreover, there is no reason why all valid targets of a dynamic jump should follow a nice regular pattern. They are just addresses in the executable code, often arbitrarily assigned by a compiler. Hence any analysis based on popular domains (i.e. convex domains, possibly enhanced with congruence information) will introduce many false targets. For example, consider an instruction cgoto(x) with x ∈ {1355, 1356, 2126}: such an analysis cannot recover better than x ∈ [1355..2126], so about 99% of the reported targets are false. Unfortunately, dynamic jumps are ubiquitous in native code programs: they are introduced at compile-time either for efficiency (switch in C) or by necessity (return statements, function pointers in C, virtual methods in C++, etc.).
Related approaches. Industrial tools like IDA Pro [10] or aiT [9] usually rely on linear sweep decoding (brute-force decoding of all code addresses) or recursive traversal (recursive decoding until a dynamic jump is encountered), enhanced with limited constant propagation, pattern matching techniques based on knowledge of the compilation chain, and user annotations. These techniques are unsafe on general programs, missing many legal targets and branches. The only safe techniques are those by Reps et al. [4, 5], based mainly on stride interval propagation, and by Kinder and Veith [7, 8], based on k-set (sets of bounded cardinality) propagation. Experiments reported by the authors show that while each approach performs much better than current industrial tools, both techniques still recover many false targets. In particular, stride intervals cannot precisely capture sets of jump targets, and k-sets are too sensitive to their cardinality bound, potentially leading to either imprecise or expensive analyses.
Our approach. We propose an original refinement-based procedure to solve CFG reconstruction [3]. The procedure is built on two main steps: a forward k-set propagation with local cardinality bounds (ranging from 0 up to a given parameter Kmax), and a refinement step controlling these cardinality bounds. The forward propagation is mostly standard, enhanced with a few original mechanisms: (1) abstract values are downcast according to local cardinality bounds, deliberately losing information to gain efficiency; (2) ⊤ values (i.e. abstract values denoting the whole domain) are tagged with additional information recording their origin (for example, ⊤〈1,3,12〉 denotes the abstraction to ⊤ of the k-set {1, 3, 12}), which makes it possible to pinpoint the initial sources of precision loss (ispl) and gives clues for correction (cf. refinement); (3) aliases, jump targets and branches that have been fired during propagation are recorded in a journal (cf. refinement). Refinement is lazy and on-demand. When a jump expression evaluates to ⊤, the refinement mechanism takes place, trying to identify the ispls responsible for the violation (guided by backward data dependencies and journal information) and to correct them by locally improving the domain precision (using ⊤-flags).
Results. From a theoretical point of view, the procedure is sound and runs in polynomial time. Moreover, it is as precise as standard k-set propagation on a class of non-trivial programs, including dynamic jumps and aliases [3]. From a practical point of view, the procedure has been implemented and evaluated on an industrial safety-critical program (32 kloc) and on small handcrafted programs. It appears to be reasonably efficient (taking less than 5 minutes for the industrial case study), very precise (only 7% of false targets, beating standard approaches based on convex domains by several orders of magnitude), and very robust: the procedure does need an initial parameter, but its exact value does not seem to matter.
⋆ Work partially funded by ANR (grants ANR-05-RNTL-02606 and ANR-08-SEGI-006).
⋆⋆ The material presented here is taken from a preliminary version of the VMCAI’11 paper [3].
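The two ideas above can be illustrated with a small Python sketch. It is a toy model under stated assumptions, not the authors' implementation: the names KSet, downcast, refine_bound and interval_false_targets are hypothetical, and the refinement step here simply raises a local bound from the tag carried by ⊤, whereas the procedure described above is guided by backward data dependencies and a journal.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional, Tuple


@dataclass(frozen=True)
class KSet:
    """Abstract value: a finite set of constants, or TOP tagged with its origin."""
    values: Optional[FrozenSet[int]]          # None encodes TOP (the whole domain)
    origin: Optional[FrozenSet[int]] = None   # set that was abstracted to TOP, if known

    @property
    def is_top(self) -> bool:
        return self.values is None


def downcast(values: Iterable[int], bound: int) -> KSet:
    """Keep the set if it fits the local cardinality bound, else return tagged TOP."""
    s = frozenset(values)
    if len(s) <= bound:
        return KSet(s)
    return KSet(None, origin=s)


def refine_bound(value: KSet, bound: int, k_max: int) -> int:
    """Toy refinement (assumption): raise the local bound just enough to hold
    the set recorded in the TOP tag, provided it fits within k_max."""
    if value.is_top and value.origin is not None and len(value.origin) <= k_max:
        return max(bound, len(value.origin))
    return bound


def interval_false_targets(targets: Iterable[int]) -> Tuple[int, int]:
    """Return (false targets, hull size) for the interval hull of the targets."""
    t = set(targets)
    hull_size = max(t) - min(t) + 1
    return hull_size - len(t), hull_size


# Convex abstraction of the cgoto(x) example: x in {1355, 1356, 2126}.
false_count, hull_size = interval_false_targets({1355, 1356, 2126})
print(f"interval hull: {false_count}/{hull_size} false targets "
      f"({false_count / hull_size:.1%})")

# k-set propagation with a local bound of 2 loses the set {1, 3, 12} ...
value = downcast({1, 3, 12}, bound=2)
print("TOP with origin:", sorted(value.origin) if value.is_top else None)

# ... and the tag tells the refinement which bound would have been enough.
new_bound = refine_bound(value, bound=2, k_max=8)
print("refined bound:", new_bound, "->", sorted(downcast({1, 3, 12}, new_bound).values))
```

On the cgoto(x) example the interval hull [1355..2126] contains 772 addresses, of which 769 are false targets, while a k-set with a large enough local bound keeps exactly the three real ones.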
Similar resources
6th International Workshop on Systems Software Verification
This paper examines a novel strategy for developing correctness proofs in interactive software verification for C programs. Rather than proceeding backwards from the generated verification conditions, we start by developing a library of the employed data structures and related coding idioms. The application of that library then leads to correctness proofs that reflect informal arguments about t...
5th Workshop on Software and Usability Engineering Cross-Pollination: Patterns, Usability and User Experience
The workshop focuses on how process models, methods and knowledge from the area of Human-Computer Interaction can be integrated and adopted to support and enhance traditional software engineering processes. In its 5th edition this workshop will investigate the application of usability engineering methods that are adapted to fit the evaluation of advanced interfaces and how usability and user expe...
Symbolic Model Checking and Safety Assessment of Altarica models
Altarica is a language used to describe critical systems. In this paper we present a novel approach to the analysis of Altarica models, based on a translation into an extended version of NuSMV. This approach opens up the possibility to carry out functional verification and safety assessment with symbolic techniques. An experimental evaluation on a set of industrial case studies demonstrates the...
Proceedings of the 5th International Workshop on Critical Systems Development Using Modeling Languages (CSDUML 2006)
The proceedings present the accepted contributions for the 5th International Workshop on Critical Systems Development Using Modeling Languages (CSDUML’06). CSDUML’06 takes place on October 1, 2006, in Genova, Italy, and is organised in conjunction with MoDELS’06 (October 1 – 6, 2006). The papers represent research in four areas: specification and analysis, system synthesis, verification, and indu...
An Analytic Evaluation of SystemC Encodings in Promela
SystemC is a de-facto standard language for high-level modeling of systems on chip. We investigate the feasibility of explicit state model checking of SystemC programs, proposing several ways to convert SystemC into Promela. We analyze the expressiveness of the various encoding styles, and we experimentally evaluate their impact on the search carried out by SPIN on a significant set of benchmar...
Journal title:
Volume, issue:
Pages: -
Publication date: 2011